Skip to content

feat: [metrics] Create metrics framework and add connection metrics#407

Open
charlesdong1991 wants to merge 11 commits intoapache:mainfrom
charlesdong1991:metrics-framework
Open

feat: [metrics] Create metrics framework and add connection metrics#407
charlesdong1991 wants to merge 11 commits intoapache:mainfrom
charlesdong1991:metrics-framework

Conversation

@charlesdong1991
Copy link
Contributor

Purpose

Linked issue: close #390

Brief change log

This PR adds a metrics framework and connection level metrics.

NOTE:
There is an parity from Java, which is the scopeguard for in-flight on cancellation, because Java callback model doesn't have equivalent to tokio future cancellation. And if a tokio future is dropped mid-execution, it would skip decrement and cause gauge drift. So i think it's important to add "scopeguard" to avoid that.

And I can have writer and scanner metrics as follow-up PRs.

Tests

All tests are passed locally.

Copy link
Contributor

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesdong1991 Ty for the PR, left some comments
PTAL

@charlesdong1991
Copy link
Contributor Author

Thanks for reviews, i made some changes and leave a question. PTAL @fresh-borzoni

Copy link
Contributor

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesdong1991 LGTM overall, small comment about misleading doc, and we can do simple math to match Java here, at least semantically. PTAL

@charlesdong1991
Copy link
Contributor Author

and we can do simple math to match Java here, at least semantically. PTAL

btw, just to clarify before making changes, do you mean to update code to fix header overhead? or to update the doc to reflect this? @fresh-borzoni

@fresh-borzoni
Copy link
Contributor

fresh-borzoni commented Mar 5, 2026

and we can do simple math to match Java here, at least semantically. PTAL

btw, just to clarify before making changes, do you mean to update code to fix header overhead? or to update the doc to reflect this? @fresh-borzoni

Sorry, I should have been more specific - I meant code @charlesdong1991

@charlesdong1991
Copy link
Contributor Author

Gotcha, thanks @fresh-borzoni will do a quick change, should be straightforward 👍

@charlesdong1991
Copy link
Contributor Author

i updated code to calculate total size "manually", PTAL @fresh-borzoni

thanks again!

@fresh-borzoni
Copy link
Contributor

@charlesdong1991 TY, LGTM

@charlesdong1991
Copy link
Contributor Author

would be great if you can take a look! Thanks! @luoyuxia

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a client-side metrics framework (via the metrics facade) and instruments RPC connection requests to emit per-API-key counters, gauges, and latency histograms (issue #390).

Changes:

  • Introduces crates/fluss/src/metrics.rs with metric name constants and an api_key label filter for reportable APIs.
  • Instruments ServerConnectionInner::request to record request/response counts, bytes sent/received, in-flight gauge, and request latency (with drop-based cleanup on cancellation).
  • Adds unit/integration-style tests around connection metrics emission and failure paths; wires in metrics/metrics-util dependencies.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
crates/fluss/src/rpc/server_connection.rs Emits connection-level metrics around request lifecycle; adds tests validating metrics behavior (success, send failure, API error).
crates/fluss/src/rpc/mod.rs Re-exports ApiKey as pub(crate) to support metrics helpers.
crates/fluss/src/rpc/message/header.rs Exposes REQUEST_HEADER_LENGTH for consistent byte accounting.
crates/fluss/src/metrics.rs New metrics constants + api_key label helper and tests.
crates/fluss/src/lib.rs Exposes metrics module from the crate root.
crates/fluss/Cargo.toml Adds metrics dependency and metrics-util for dev/test support.
Cargo.toml Adds workspace-level metrics dependency version.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +229 to +253
fn begin(api_key: crate::rpc::ApiKey, request_bytes: u64) -> Self {
let label = crate::metrics::api_key_label(api_key);
if let Some(label) = label {
// Match Java semantics: count request attempts before write/send.
metrics::counter!(
crate::metrics::CLIENT_REQUESTS_TOTAL,
crate::metrics::LABEL_API_KEY => label
)
.increment(1);
metrics::counter!(
crate::metrics::CLIENT_BYTES_SENT_TOTAL,
crate::metrics::LABEL_API_KEY => label
)
.increment(request_bytes);
metrics::gauge!(
crate::metrics::CLIENT_REQUESTS_IN_FLIGHT,
crate::metrics::LABEL_API_KEY => label
)
.increment(1.0);
}
Self {
label,
start: Instant::now(),
completed: false,
}
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RequestMetricsLifecycle::begin always captures Instant::now() and allocates a lifecycle object even when the API key is non-reportable (label == None). Since the start timestamp is only used for latency/histogram on labeled series, consider skipping Instant::now() (e.g., store start: Option<Instant> or return Option<RequestMetricsLifecycle> from begin) to avoid adding overhead to admin/metadata/auth requests that intentionally do not emit metrics.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it matches java behaviour now as it captures start time at inflight creation regardless of reportability.

also not sure if it is an big issue honestly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesdong1991 not a big deal, we may ignore

@charlesdong1991
Copy link
Contributor Author

can i get another round of reviews? @luoyuxia @fresh-borzoni thanks!

Copy link
Contributor

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesdong1991 Looked through one more time, LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Client metrics framework

3 participants